SRS: Solving c-Approximate Nearest Neighbor Queries in High Dimensional Euclidean Space with a Tiny Index

Authors

  • Yifang Sun
  • Wei Wang
  • Jianbin Qin
  • Ying Zhang
  • Xuemin Lin
Abstract

Nearest neighbor search in high-dimensional space has many important applications in domains such as data mining and multimedia databases. The problem is challenging due to the phenomenon known as the “curse of dimensionality”. An alternative solution is to consider algorithms that return a c-approximate nearest neighbor (c-ANN) with guaranteed probability. Locality Sensitive Hashing (LSH) is among the most widely adopted methods, and it achieves high efficiency both in theory and in practice. However, it is known to require an extremely large amount of space for indexing, which limits its scalability. In this paper, we propose several surprisingly simple methods to answer c-ANN queries with theoretical guarantees while requiring only a single tiny index. Our methods are highly flexible and support a variety of functionalities, such as finding the exact nearest neighbor with any given probability. In the experiments, our methods demonstrate superior performance over the state-of-the-art LSH-based methods and scale well to 1 billion high-dimensional points on a single commodity PC.
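
For readers unfamiliar with the term, the following is a minimal sketch of what "c-approximate nearest neighbor" means under the standard definition: a point is a c-ANN of a query q if its distance to q is at most c times the distance from q to the exact nearest neighbor. This is a brute-force check for illustration only, not the method proposed in the paper; the function names (`euclidean`, `exact_nn`, `is_c_ann`) and the sample points are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_nn(points, q):
    """Exact nearest neighbor of q by linear scan (illustrative baseline)."""
    return min(points, key=lambda p: euclidean(p, q))


def is_c_ann(candidate, points, q, c):
    """Return True if `candidate` is a c-approximate nearest neighbor of q,
    i.e. dist(candidate, q) <= c * dist(exact NN, q)."""
    best = euclidean(exact_nn(points, q), q)
    return euclidean(candidate, q) <= c * best


# Hypothetical toy data: the exact NN of q is (0, 0) at distance 1.
points = [(0.0, 0.0), (3.0, 0.0), (10.0, 10.0)]
q = (1.0, 0.0)
```

With this toy data, `(3.0, 0.0)` lies at distance 2 from `q`, so it qualifies as a 2-ANN but not as a 1.5-ANN. A c-ANN algorithm is allowed to return any such qualifying point, which is what makes sublinear query time possible.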


Similar resources

The Analysis of a Probabilistic Approach to Nearest Neighbor Searching

Given a set S of n data points in some metric space and a query point q in this space, a nearest neighbor query asks for the point of S nearest to q. Throughout we will assume that the space is real d-dimensional space R^d, and the metric is Euclidean distance. The goal is to preprocess S into a data structure so that such queries can be answered efficiently. Nearest neighbor searching has ap...


SC-LSH: An Efficient Indexing Method for Approximate Similarity Search in High Dimensional Space

Locality Sensitive Hashing (LSH) is one of the most promising techniques for solving the nearest neighbour search problem in high dimensional space. Euclidean LSH is the most popular variation of LSH and has been successfully applied in many multimedia applications. However, Euclidean LSH has limitations that affect its structure and query performance. The main limitation of the Euclidean LS...


Approximate Nearest Neighbors Search Without False Negatives For l_2 For c>sqrt{loglog{n}}

In this paper, we report progress on answering the open problem presented by Pagh [11], who considered the nearest neighbor search without false negatives for the Hamming distance. We show new data structures for solving the c-approximate nearest neighbors problem without false negatives for Euclidean high dimensional space R^d. These data structures work for any c = ω(√(log log n)), where n is th...


Approximate Nearest Neighbor Queries in Fixed Dimensions

Given a set of n points in d-dimensional Euclidean space, S ⊂ E^d, and a query point q ∈ E^d, we wish to determine the nearest neighbor of q, that is, the point of S whose Euclidean distance to q is minimum. The goal is to preprocess the point set S, such that queries can be answered as efficiently as possible. We assume that the dimension d is a constant independent of n. Although reasonably goo...


Approximate Nearest Neighbor Search Amid Higher-Dimensional Flats

We consider the approximate nearest neighbor (ANN) problem where the input set consists of n k-flats in the Euclidean R^d, for any fixed parameters 0 ≤ k < d, and where, for each query point q, we want to return an input flat whose distance from q is at most (1 + ε) times the shortest such distance, where ε > 0 is another prespecified parameter. We present an algorithm that achieves this task wit...



Journal:
  • PVLDB

Volume 8

Publication date: 2014